In Take-home Exercise 5, we examine and characterize the distinct areas of the city of Engagement, Ohio, USA.
As part of Challenge 2 of the VAST Challenge 2022, we are asked to characterize the distinct areas of the city. Sub-questions to consider:
● Where are the busiest areas in Engagement?
● Are there traffic bottlenecks that should be addressed?
Before we get started, we need to make sure that the required R packages are installed. The code chunk below installs any package that is missing and then loads all of them into the R environment.
packages <- c('tidyverse', 'lubridate', 'sf', 'tmap', 'sftime',
              'rmarkdown', 'clock')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
To address the first sub-question, we define the busiest areas as those with the highest check-in counts.
The data sets used to examine the busiest areas are:
● TravelJournal - to obtain the location visited by the participants throughout the study.
● Schools - to get the location of the schools for plotting
● Pubs - to get the location of the pubs for plotting
● Restaurants - to get the location of the restaurants for plotting
● Apartments - to get the location of the apartments for plotting
● Employers - to get the location of the workplaces for plotting
● Buildings - to get the building footprints used as the base map for plotting
travel <- read_csv("data/TravelJournal.csv")
schools <- read_sf("data/wkt/Schools.csv", options = "GEOM_POSSIBLE_NAMES=location")
pubs <- read_sf("data/wkt/Pubs.csv", options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/wkt/Apartments.csv", options = "GEOM_POSSIBLE_NAMES=location")
buildings <- read_sf("data/wkt/Buildings.csv", options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/wkt/Employers.csv", options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/wkt/Restaurants.csv", options = "GEOM_POSSIBLE_NAMES=location")
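As a minimal sketch of what the `GEOM_POSSIBLE_NAMES` option does, the snippet below writes a tiny CSV with a WKT column to a temporary file and reads it back. The file contents, the coordinates and the `toy_` names are made up for illustration; only the option string is taken from the code above.

```r
library(sf)

# Hypothetical two-row file that mimics the layout of a WKT location file
toy_csv <- tempfile(fileext = ".csv")
write.csv(
  data.frame(pubId    = c(1, 2),
             location = c("POINT (10 20)", "POINT (30 40)")),
  toy_csv,
  row.names = FALSE
)

# GEOM_POSSIBLE_NAMES tells GDAL's CSV driver which column holds the WKT geometry
toy_pubs <- read_sf(toy_csv, options = "GEOM_POSSIBLE_NAMES=location")

st_geometry_type(toy_pubs)  # geometry type parsed from the WKT text
class(toy_pubs$pubId)       # attribute columns are read as text by the CSV driver
```

Because the CSV driver reads attribute columns as text unless told otherwise, the ID columns of the location files end up as character, which is why the numeric IDs in the travel data are converted before joining.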
First, we extract the day of the week from checkInTime. The Weekday field will be used to examine whether the busiest areas differ between weekdays and weekends.
travelStartLocationId and travelEndLocationId are both stored as numeric, so we convert them to character. This allows them to be joined later to the other data sets (e.g. pubs, schools), whose ID fields are character. The IDs are also reference codes that identify distinct places rather than quantities on a continuous scale, so a numeric type is not appropriate. We also give both fields shorter names for convenience.
Finally, we select only the fields needed for the visualization.
travel <- travel %>%
  mutate(Weekday = wday(checkInTime, label = TRUE),
         venueId = as.character(travelEndLocationId),
         originId = as.character(travelStartLocationId)) %>%
  select(participantId, originId, venueId, checkInTime, Weekday)
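To make the effect of these steps concrete, here is a small sketch on made-up rows. The `toy_travel` values are invented; only the column names and the transformation follow the code above.

```r
library(dplyr)
library(lubridate)

# Made-up rows that reuse the TravelJournal column names
toy_travel <- tibble(
  participantId         = c(1, 2, 3),
  travelStartLocationId = c(10, 11, 12),
  travelEndLocationId   = c(1342, 1801, 1342),
  checkInTime           = ymd_hms(c("2022-03-04 18:00:00",
                                    "2022-03-05 12:30:00",
                                    "2022-03-07 08:45:00"))
)

toy_clean <- toy_travel %>%
  mutate(Weekday  = wday(checkInTime, label = TRUE),        # ordered factor of day labels
         venueId  = as.character(travelEndLocationId),      # IDs become text keys
         originId = as.character(travelStartLocationId)) %>%
  select(participantId, originId, venueId, checkInTime, Weekday)

glimpse(toy_clean)
```

`glimpse()` shows that `venueId` and `originId` are now character columns and that `Weekday` is an ordered factor.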
Because TravelJournal is large and GitHub limits file sizes, we save the reduced data in RDS format and read it back from there.
write_rds(travel,
"data/rds/travel.rds")
travel <- read_rds("data/rds/travel.rds")
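If file size is the main concern, note that `write_rds()` does not compress by default; compression has to be requested through its `compress` argument. The sketch below shows a round trip with gzip compression on a made-up tibble written to a temporary file (the data and path are illustrative only).

```r
library(readr)
library(tibble)

toy <- tibble(venueId = c("1342", "1801"), visitors = c(120L, 95L))
rds_path <- tempfile(fileext = ".rds")

# compress = "gz" is opt-in; the default is compress = "none"
write_rds(toy, rds_path, compress = "gz")
toy_back <- read_rds(rds_path)

# The object read back should match the one that was written
all.equal(toy, toy_back)
```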
We first consider the busiest areas over the whole study period. We group the travel data by venueId (the travelEndLocationId) and count the number of visits to each venue. originId is not counted, to avoid double counting each trip.
No_visitors <- travel %>% group_by(venueId) %>% summarise(visitors = n())
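Note that `n()` counts check-ins, so a participant who returns to the same venue is counted each time. The sketch below, on made-up rows, puts that count next to the number of distinct participants; the `check_ins` and `unique_visitors` column names are hypothetical.

```r
library(dplyr)

# Made-up check-ins: participant 1 visits venue 1342 twice
toy_travel <- tibble(
  participantId = c(1, 1, 2, 3, 1),
  venueId       = c("1342", "1342", "1342", "1801", "449")
)

toy_visitors <- toy_travel %>%
  group_by(venueId) %>%
  summarise(check_ins       = n(),                        # every row counts
            unique_visitors = n_distinct(participantId),  # each person counts once
            .groups = "drop") %>%
  arrange(desc(check_ins))

toy_visitors
```

Which of the two measures better reflects "busy" is a judgment call; the analysis here uses the check-in count.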
The travel data has no coordinates, so we join the visitor counts to each location file to obtain the geometry needed for plotting.
pub_v <- left_join(pubs, No_visitors, by = c('pubId' = 'venueId'))
restaurants_v <- left_join(restaurants, No_visitors, by = c('restaurantId' = 'venueId'))
employers_v <- left_join(employers, No_visitors, by = c('employerId' = 'venueId'))
schools_v <- left_join(schools, No_visitors, by = c('schoolId' = 'venueId'))
apartments_v <- left_join(apartments, No_visitors, by = c('apartmentId' = 'venueId'))
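A left join keeps every location, including those that never appear as a destination, and gives them a missing visitor count. The sketch below illustrates this on plain tibbles with made-up IDs and counts (pub 9999 is hypothetical); the same behaviour applies when the left table is an sf object.

```r
library(dplyr)

toy_pubs     <- tibble(pubId = c("1342", "1344", "9999"))
toy_visitors <- tibble(venueId  = c("1342", "1344"),
                       visitors = c(120L, 80L))

# Both keys are character, matching the conversion done on the travel data
toy_pub_v <- left_join(toy_pubs, toy_visitors, by = c("pubId" = "venueId"))

# Locations without any check-in come back with visitors = NA
toy_pub_v %>% filter(is.na(visitors))
```

Checking for `NA` rows after the join is a quick way to confirm that the keys matched as intended.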
For the visualization of geo-spatial data, we will use tmap.
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 2,
border.col = "black",
border.lwd = 1) +
tm_shape(pub_v) +
tm_bubbles(col = "red",
size = "visitors",
alpha = 0.3) +
tm_shape(restaurants_v) +
tm_bubbles(col = "blue",
size = "visitors",
alpha = 0.3) +
tm_shape(employers_v) +
tm_bubbles(col = "yellow",
size = "visitors",
alpha = 0.3) +
tm_shape(schools_v) +
tm_bubbles(col = "green",
size = "visitors",
alpha = 0.3) +
tm_shape(apartments_v) +
tm_bubbles(col = "pink",
size = "visitors",
alpha = 0.3)
tmap_mode("plot")
Figure 1
With tmap in view mode, users can zoom in and out of the plot, select and deselect the location layers for deeper investigation, and hover to see building details.
Based on Figure 2, the central area (green circle) and the north-western area (purple circle) tend to be more densely packed with pubs and restaurants.
Pubs 1342 and 1344, both within the green circle, have two of the biggest circles, indicating that they are among the most highly visited pubs in the area. This may be because their central location makes them convenient places for participants to socialise.
Restaurants with restaurantId 1801, 1805 and 449 are the top three restaurants in the city of Engagement, with restaurants 1801 and 1805 located within the purple circle.
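Rankings read off bubble sizes can be cross-checked in a table. The sketch below uses `slice_max()` on a made-up tibble; the visitor counts are invented and `toy_restaurants_v` merely stands in for the joined data. With an sf object, calling `st_drop_geometry()` first gives a plain table to rank.

```r
library(dplyr)

# Invented counts, for illustration only
toy_restaurants_v <- tibble(
  restaurantId = c("445", "1801", "449", "1805"),
  visitors     = c(90L, 300L, 250L, 280L)
)

# Keep the three rows with the largest visitor counts, largest first
top3 <- toy_restaurants_v %>%
  slice_max(visitors, n = 3)

top3
```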
Next, we filter the data to keep only Saturdays and Sundays.
We then group by venueId to compute the number of visitors at each venue over the weekends.
No_visitors_wEnd <- travel %>%
  filter(Weekday %in% c("Sat", "Sun")) %>%
  group_by(venueId) %>%
  summarise(visitors = n())
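The labels returned by `wday(label = TRUE)` depend on the session locale, so matching on "Sat" and "Sun" assumes English day names. A locale-independent alternative, sketched here on made-up check-ins, is to filter on the numeric day of the week instead.

```r
library(dplyr)
library(lubridate)

toy_travel <- tibble(
  venueId     = c("1342", "1342", "1801"),
  checkInTime = ymd_hms(c("2022-03-05 20:00:00",   # a Saturday
                          "2022-03-06 13:00:00",   # a Sunday
                          "2022-03-07 09:00:00"))  # a Monday
)

toy_wEnd <- toy_travel %>%
  # With week_start = 1, Monday is 1, so Saturday and Sunday are 6 and 7
  filter(wday(checkInTime, week_start = 1) >= 6) %>%
  count(venueId, name = "visitors")

toy_wEnd
```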
pub_wEnd <- left_join(pubs, No_visitors_wEnd, by = c('pubId' = 'venueId'))
restaurants_wEnd <- left_join(restaurants, No_visitors_wEnd, by = c('restaurantId' = 'venueId'))
employers_wEnd <- left_join(employers, No_visitors_wEnd, by = c('employerId' = 'venueId'))
schools_wEnd <- left_join(schools, No_visitors_wEnd, by = c('schoolId' = 'venueId'))
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 2,
border.col = "black",
border.lwd = 1) +
tm_shape(pub_wEnd) +
tm_bubbles(col = "red",
size = "visitors",
alpha = 0.3) +
tm_shape(restaurants_wEnd) +
tm_bubbles(col = "blue",
size = "visitors",
alpha = 0.3) +
tm_shape(employers_wEnd) +
tm_bubbles(col = "yellow",
size = "visitors",
alpha = 0.3) +
tm_shape(schools_wEnd) +
tm_bubbles(col = "green",
size = "visitors",
alpha = 0.3)
tmap_mode("plot")
Figure 3
Next, we filter the data to exclude Saturdays and Sundays.
We then group by venueId to compute the number of visitors at each venue over the weekdays.
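# Weekday counts, mirroring the weekend step with the filter negated
No_visitors_wDay <- travel %>%
  filter(!Weekday %in% c("Sat", "Sun")) %>%
  group_by(venueId) %>%
  summarise(visitors = n())
pub_wDay <- left_join(pubs, No_visitors_wDay, by = c('pubId' = 'venueId'))
restaurants_wDay <- left_join(restaurants, No_visitors_wDay, by = c('restaurantId' = 'venueId'))
employers_wDay <- left_join(employers, No_visitors_wDay, by = c('employerId' = 'venueId'))
schools_wDay <- left_join(schools, No_visitors_wDay, by = c('schoolId' = 'venueId'))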
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 2,
border.col = "black",
border.lwd = 1) +
tm_shape(pub_wDay) +
tm_bubbles(col = "red",
size = "visitors",
alpha = 0.3) +
tm_shape(restaurants_wDay) +
tm_bubbles(col = "blue",
size = "visitors",
alpha = 0.3) +
tm_shape(employers_wDay) +
tm_bubbles(col = "yellow",
size = "visitors",
alpha = 0.3) +
tm_shape(schools_wDay) +
tm_bubbles(col = "green",
size = "visitors",
alpha = 0.3)
tmap_mode("plot")
Figure 4
We observe that the yellow circles representing employers are more prominent in Figure 4 than in Figure 3, indicating that fewer participants work on weekends.
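One caveat when comparing raw counts: a week has five weekdays but only two weekend days, so weekday totals are larger even if daily activity is the same. A possible cross-check, sketched below on made-up check-ins at a hypothetical employer "E1", is to divide the visits by the number of days of each type. The `dayType`, `days` and `visits_per_day` names are assumptions for illustration.

```r
library(dplyr)
library(lubridate)

# One visit per day: four weekdays (Mon-Thu) and two weekend days
toy_travel <- tibble(
  venueId     = "E1",
  checkInTime = ymd_hms(c("2022-03-07 09:00:00", "2022-03-08 09:00:00",
                          "2022-03-09 09:00:00", "2022-03-10 09:00:00",
                          "2022-03-05 09:00:00", "2022-03-06 09:00:00"))
) %>%
  mutate(date    = as_date(checkInTime),
         dayType = if_else(wday(checkInTime, week_start = 1) >= 6,
                           "weekend", "weekday"))

# Number of distinct calendar days of each type in the data
day_counts <- toy_travel %>%
  distinct(date, dayType) %>%
  count(dayType, name = "days")

# Visits per venue and day type, normalised by the number of such days
toy_rates <- toy_travel %>%
  count(venueId, dayType, name = "visits") %>%
  left_join(day_counts, by = "dayType") %>%
  mutate(visits_per_day = visits / days)

toy_rates
```

In this toy data the raw counts differ (4 versus 2) while the daily rate is identical, which is the situation the normalisation is meant to expose.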
logs <- read_sf("data/wkt/ParticipantStatusLogs1.csv", options = "GEOM_POSSIBLE_NAMES=currentLocation")
logs_selected <- logs %>%
  mutate(Timestamp = date_time_parse(timestamp,
                                     zone = "",
                                     format = "%Y-%m-%dT%H:%M:%S")) %>%
  mutate(day = get_day(Timestamp)) %>%
  filter(currentMode == "Transport")
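The parsing step can be tried in isolation. The sketch below parses three made-up timestamps with clock and extracts the day; the strings and the "UTC" zone are assumptions for illustration. `get_day()` returns the day of the month, so the same value appears again in a later month. Whether that matters depends on the date range covered by the log file, which is not stated here.

```r
library(clock)

# Made-up timestamps; the third falls in a different month
ts <- c("2022-03-01T23:55:00", "2022-03-02T00:05:00", "2022-04-02T08:00:00")

parsed <- date_time_parse(ts, zone = "UTC", format = "%Y-%m-%dT%H:%M:%S")

get_day(parsed)   # day of the month: 2 March and 2 April share the same value
as_date(parsed)   # full calendar date, unique across months
```

If the logs span more than one month, grouping by the full date rather than the day of the month keeps the daily paths separate.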
To obtain the path traveled by each participant on each day, we group the records by participantId and day, summarize each group (here by its mean timestamp) with do_union = FALSE so that the points are combined rather than unioned, and cast the result to LINESTRING.
logs_path <- logs_selected %>%
group_by(participantId, day) %>%
summarize(m = mean(Timestamp),
do_union=FALSE) %>%
st_cast("LINESTRING")
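The grouping-and-casting pattern can be seen on a handful of made-up points. In this sketch the coordinates and the `step` ordering column are invented; sorting before summarising is an added precaution, on the assumption that the line should follow the points in time order, because `do_union = FALSE` combines the points in row order.

```r
library(sf)
library(dplyr)

# Made-up GPS-like points for two participants
toy_pts <- tibble(
  participantId = c(1, 1, 1, 2, 2),
  step          = c(2, 1, 3, 1, 2),   # deliberately out of order
  x             = c(1, 0, 2, 5, 6),
  y             = c(1, 0, 0, 5, 5)
) %>%
  st_as_sf(coords = c("x", "y"))

toy_paths <- toy_pts %>%
  arrange(participantId, step) %>%                    # put points in travel order
  group_by(participantId) %>%
  summarise(n_points = n(), do_union = FALSE) %>%     # combine into MULTIPOINT, keep order
  st_cast("LINESTRING")                               # connect the points into a path

toy_paths
```

Each participant ends up with one LINESTRING whose vertices follow the sorted points.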
logs_path_selected <- logs_path %>%
filter(day == 2)
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
size = 1,
border.col = "black",
border.lwd = 1) +
tm_shape(logs_path_selected) +
tm_lines(col = "participantId") +
tm_layout(legend.show=FALSE)
tmap_mode("plot")
Figure 5
Where many participants' paths overlap, the lines stack and appear thicker, so the thickest stretches mark the routes most commonly used by the participants.